Imagine that a business wants to locate its offices in the Bronx to take advantage of NYC incentives and relatively low rents. The operation is 24/7, so some workers will be coming to and going from work late at night. Are there any areas so unsafe that the business should avoid these neighborhoods?
Open NYC Data has NYPD complaint data for the past few months at https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-YTD/5uac-w243/data, We will use tidyverse to read and inspect the data structurally and visually. A simple lat and lon (latitude and longitude) scatter plot will indicate light and dark areas for us to inspect the data.
library(tidyverse)
## -- Attaching packages --------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggmap)
library(stringr)
library(viridis)
## Loading required package: viridisLite
#Load data
bx_crime <- read_csv("bx-crime.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## CMPLNT_NUM = col_integer(),
## CMPLNT_FR_TM = col_time(format = ""),
## CMPLNT_TO_TM = col_time(format = ""),
## KY_CD = col_integer(),
## PD_CD = col_integer(),
## ADDR_PCT_CD = col_integer(),
## X_COORD_CD = col_integer(),
## Y_COORD_CD = col_integer(),
## lat = col_double(),
## lon = col_double()
## )
## See spec(...) for full column specifications.
# Inspect data
bx_crime %>% glimpse()
## Observations: 101,671
## Variables: 23
## $ CMPLNT_NUM <int> 764775390, 725800636, 581626843, 641549773, ...
## $ CMPLNT_FR_DT <chr> "12/31/2017", "12/30/2017", "12/30/2017", "1...
## $ CMPLNT_FR_TM <time> 01:00:00, 12:50:00, 12:05:00, 11:15:00, 10:...
## $ CMPLNT_TO_DT <chr> "12/31/2017", "12/30/2017", "12/30/2017", "1...
## $ CMPLNT_TO_TM <time> 09:00:00, 12:52:00, 12:15:00, 11:30:00, 10:...
## $ RPT_DT <chr> "12/31/2017", "12/30/2017", "12/30/2017", "1...
## $ KY_CD <int> 109, 109, 110, 344, 105, 578, 344, 359, 359,...
## $ OFNS_DESC <chr> "GRAND LARCENY", "GRAND LARCENY", "GRAND LAR...
## $ PD_CD <int> 421, 415, 441, 101, 386, 638, 101, 750, 748,...
## $ PD_DESC <chr> "LARCENY,GRAND FROM VEHICLE/MOTORCYCLE", "LA...
## $ CRM_ATPT_CPTD_CD <chr> "COMPLETED", "COMPLETED", "COMPLETED", "COMP...
## $ LAW_CAT_CD <chr> "FELONY", "FELONY", "FELONY", "MISDEMEANOR",...
## $ JURIS_DESC <chr> "N.Y. POLICE DEPT", "N.Y. POLICE DEPT", "N.Y...
## $ BORO_NM <chr> "BRONX", "BRONX", "BRONX", "BRONX", "BRONX",...
## $ ADDR_PCT_CD <int> 44, 40, 40, 47, 42, 40, 42, 44, 47, 46, 47, ...
## $ LOC_OF_OCCUR_DESC <chr> NA, "FRONT OF", "FRONT OF", "FRONT OF", "FRO...
## $ PREM_TYP_DESC <chr> "STREET", "STREET", "STREET", "STREET", "STR...
## $ PARKS_NM <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ HADEVELOPT <chr> NA, NA, NA, NA, NA, NA, NA, "CLAREMONT REHAB...
## $ X_COORD_CD <int> 1004152, 1007668, 1006075, 1019171, 1009372,...
## $ Y_COORD_CD <int> 238742, 237200, 236932, 266070, 241649, 2381...
## $ lat <dbl> 40.82195, 40.81771, 40.81698, 40.89691, 40.8...
## $ lon <dbl> -73.92809, -73.91540, -73.92115, -73.87369, ...
ggplot() +
geom_point(data = bx_crime, aes(x = lon, y = lat), alpha = .05)
we need a satellite map of the Bronx and will use ggmap to download and view the map.
map_bx <- get_map("Bronx", zoom = 12, maptype = 'satellite')
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Bronx&zoom=12&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Bronx&sensor=false
ggmap(map_bx)
Again using ggmap, based on ggplot2, we use the stat_density2d function to view an overlay of the incidents scatterplot onto the Google satellite map. Titles, labels, density colors (courtesy of viridis and the inferno color option), along with theme formatting complete the picture.
p <- ggmap(map_bx) +
stat_density2d(data = bx_crime, aes(x = lon, y = lat,
fill = ..density..), geom = "tile", contour = F, alpha = .5) +
scale_fill_viridis(option = "inferno") +
labs(title = str_c("BX has largest concentration of crime\n"
,"near Fordham Rd., Morrisania, and the South Bronx"
),
subtitle = str_c("2017-18 source:", "\nhttps://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-YTD/5uac-w243/data"),
fill = str_c("Number of", "\ncrime incidents")
) +
theme(text = element_text(color = "#444444"),
plot.title = element_text(size = 13, face = 'bold'),
plot.subtitle = element_text(size = 8),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()
) +
guides(fill = guide_legend(override.aes= list(alpha = 1)))
p
## Warning: Removed 1 rows containing non-finite values (stat_density2d).
Not quite yet! There are neighborhoods to probably avoid and suggested by the clusters of high density incidents. Our next stop might be a visit to the Economic Development Corporation for data about economic and commercial zones in the Bronx.